OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Compilation of dictionaries for semantic attribute analysis of television news captions

Internal identifier: 001733 (Main/Exploration); previous: 001732; next: 001734

Authors: Ichiro Ide [Japan]; Reiko Hamada [Japan]; Shuichi Sakai [Japan]; Hidehiko Tanaka [Japan]

Source: Systems and Computers in Japan, 34(12): 32–44, 2003

RBID : ISTEX:C0C0323E3979ABF70A5D71C305052E35B494F8AA

English descriptors: caption, dictionary, indexing, semantic attribute, suffix noun

Abstract

As the amount of video broadcast daily grows, so does the need to store video systematically for future reuse and retrieval. News videos are especially worth indexing because of their importance and usability. For adequate automatic indexing based on the text information in the video, the simple index extraction and annotation methods widely used in conventional approaches are not sufficient; index candidates should be selected with reference to their semantic attributes. The purpose of this study is to compile the dictionaries needed for analyzing the semantic attributes of captions (noun phrases) in TV news videos. We describe the process by which words are extracted from text corpora and a thesaurus and stored on the basis of specified conditions. The quality of the dictionaries is examined by analyzing the semantic attributes of words appearing in actual news videos, and the results are presented. In evaluation experiments in which an existing proper noun dictionary and temporal noun dictionary were combined and used, a recall of 79 to 93% and a precision of 41 to 71% were obtained. Although precision is low, it is concluded that the compiled dictionaries are of practical use for indexing, where recall is more important. © 2003 Wiley Periodicals, Inc. Syst Comp Jpn, 34(12): 32–44, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10417
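
The recall and precision figures quoted in the abstract follow the usual retrieval definitions. The sketch below illustrates those definitions only; the function name and the toy word sets are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described on this page.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for two collections of items.

    precision = correct predictions / all predictions
    recall    = correct predictions / all relevant items
    """
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: words a dictionary assigned to some attribute
# versus the words that truly carry that attribute.
predicted = {"w1", "w2", "w3", "w4"}
relevant = {"w1", "w2", "w5"}

p, r = precision_recall(predicted, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In an indexing setting, a missed candidate (low recall) cannot be recovered later, whereas a spurious candidate (low precision) can still be filtered out, which is consistent with the abstract's emphasis on recall.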

Url: https://api.istex.fr/document/C0C0323E3979ABF70A5D71C305052E35B494F8AA/fulltext/pdf
DOI: 10.1002/scj.10417


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Compilation of dictionaries for semantic attribute analysis of television news captions</title>
<author>
<name sortKey="Ide, Ichiro" sort="Ide, Ichiro" uniqKey="Ide I" first="Ichiro" last="Ide">Ichiro Ide</name>
</author>
<author>
<name sortKey="Hamada, Reiko" sort="Hamada, Reiko" uniqKey="Hamada R" first="Reiko" last="Hamada">Reiko Hamada</name>
</author>
<author>
<name sortKey="Sakai, Shuichi" sort="Sakai, Shuichi" uniqKey="Sakai S" first="Shuichi" last="Sakai">Shuichi Sakai</name>
</author>
<author>
<name sortKey="Tanaka, Hidehiko" sort="Tanaka, Hidehiko" uniqKey="Tanaka H" first="Hidehiko" last="Tanaka">Hidehiko Tanaka</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:C0C0323E3979ABF70A5D71C305052E35B494F8AA</idno>
<date when="2003" year="2003">2003</date>
<idno type="doi">10.1002/scj.10417</idno>
<idno type="url">https://api.istex.fr/document/C0C0323E3979ABF70A5D71C305052E35B494F8AA/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000736</idno>
<idno type="wicri:Area/Istex/Curation">000728</idno>
<idno type="wicri:Area/Istex/Checkpoint">000F06</idno>
<idno type="wicri:doubleKey">0882-1666:2003:Ide I:compilation:of:dictionaries</idno>
<idno type="wicri:Area/Main/Merge">001810</idno>
<idno type="wicri:Area/Main/Curation">001733</idno>
<idno type="wicri:Area/Main/Exploration">001733</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Compilation of dictionaries for semantic attribute analysis of television news captions</title>
<author>
<name sortKey="Ide, Ichiro" sort="Ide, Ichiro" uniqKey="Ide I" first="Ichiro" last="Ide">Ichiro Ide</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>National Institute of Informatics, Tokyo</wicri:regionArea>
<placeName>
<settlement type="city">Tokyo</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Hamada, Reiko" sort="Hamada, Reiko" uniqKey="Hamada R" first="Reiko" last="Hamada">Reiko Hamada</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Graduate School of Engineering, The University of Tokyo, Tokyo</wicri:regionArea>
<placeName>
<settlement type="city">Tokyo</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Sakai, Shuichi" sort="Sakai, Shuichi" uniqKey="Sakai S" first="Shuichi" last="Sakai">Shuichi Sakai</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Graduate School of Information Science and Technology, The University of Tokyo, Tokyo</wicri:regionArea>
<placeName>
<settlement type="city">Tokyo</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Tanaka, Hidehiko" sort="Tanaka, Hidehiko" uniqKey="Tanaka H" first="Hidehiko" last="Tanaka">Hidehiko Tanaka</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Graduate School of Information Science and Technology, The University of Tokyo, Tokyo</wicri:regionArea>
<placeName>
<settlement type="city">Tokyo</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Systems and Computers in Japan</title>
<title level="j" type="abbrev">Syst. Comp. Jpn.</title>
<idno type="ISSN">0882-1666</idno>
<idno type="eISSN">1520-684X</idno>
<imprint>
<publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2003-11-15">2003-11-15</date>
<biblScope unit="volume">34</biblScope>
<biblScope unit="issue">12</biblScope>
<biblScope unit="page" from="32">32</biblScope>
<biblScope unit="page" to="44">44</biblScope>
</imprint>
<idno type="ISSN">0882-1666</idno>
</series>
<idno type="istex">C0C0323E3979ABF70A5D71C305052E35B494F8AA</idno>
<idno type="DOI">10.1002/scj.10417</idno>
<idno type="ArticleID">SCJ10417</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0882-1666</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>caption</term>
<term>dictionary</term>
<term>indexing</term>
<term>semantic attribute</term>
<term>suffix noun</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">With the increase in the amount of video that is broadcast daily, there is an increasing need for storage of video in a systematic way for future reuse and retrieval. In particular, from the viewpoint of importance and usability, it is desirable to index news videos. For adequate automatic indexing based on the text information in the video, it is not sufficient to apply the simple index extraction and annotation methods which have been widely used in conventional methods. It is important to select index candidates with reference to semantic attributes. The purpose of this study is to compile dictionaries which are needed for analyzing the semantic attributes of captions (noun phrases) in TV news videos. We describe the process by which words are extracted from text corpora and a thesaurus for storage on the basis of specified conditions. The quality of the dictionaries is examined by analysis of the semantic attributes of the words appearing in actual news videos, and the results are presented. In evaluation experiments in which an existing proper noun dictionary and temporal noun dictionary were combined and used, a recall of 79 to 93% and a precision of 41 to 71% were obtained. Although the precision is low in this result, it is concluded that the compiled dictionaries are of practical use for indexing since the recall is more important in that case. © 2003 Wiley Periodicals, Inc. Syst Comp Jpn, 34(12): 32–44, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10417</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Japon</li>
</country>
<settlement>
<li>Tokyo</li>
</settlement>
</list>
<tree>
<country name="Japon">
<noRegion>
<name sortKey="Ide, Ichiro" sort="Ide, Ichiro" uniqKey="Ide I" first="Ichiro" last="Ide">Ichiro Ide</name>
</noRegion>
<name sortKey="Hamada, Reiko" sort="Hamada, Reiko" uniqKey="Hamada R" first="Reiko" last="Hamada">Reiko Hamada</name>
<name sortKey="Sakai, Shuichi" sort="Sakai, Shuichi" uniqKey="Sakai S" first="Shuichi" last="Sakai">Shuichi Sakai</name>
<name sortKey="Tanaka, Hidehiko" sort="Tanaka, Hidehiko" uniqKey="Tanaka H" first="Hidehiko" last="Tanaka">Hidehiko Tanaka</name>
</country>
</tree>
</affiliations>
</record>
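
A record like the one above can also be read programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, not part of Dilib. It assumes the record is available as a string; `SAMPLE` is a shortened copy kept only for illustration. Because the record uses the `wicri:` prefix without declaring it, a namespace-aware parser is likely to reject it as-is, so the sketch binds the prefix to a placeholder URI (`urn:example:wicri`, an assumption, since the real namespace is not given here).

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record; only a few fields are kept for illustration.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Compilation of dictionaries for semantic attribute analysis of television news captions</title>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1002/scj.10417</idno>
</publicationStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>caption</term>
<term>dictionary</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder URI (assumption): the record does not declare the wicri namespace.
WICRI_NS = "urn:example:wicri"


def parse_record(text):
    """Bind the undeclared wicri prefix on the root element, then parse."""
    patched = text.replace("<record>", '<record xmlns:wicri="%s">' % WICRI_NS, 1)
    return ET.fromstring(patched)


def summarize(root):
    """Collect the title, DOI and keyword terms from a parsed record."""
    title = root.findtext(".//titleStmt/title")
    doi = root.findtext(".//publicationStmt/idno[@type='doi']")
    terms = [term.text for term in root.iter("term")]
    return {"title": title, "doi": doi, "keywords": terms}


summary = summarize(parse_record(SAMPLE))
print(summary)
```

Patching the root element keeps the rest of the record untouched, so the same two helper functions should work on the full record text as well as on the shortened sample.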

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001733 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001733 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:C0C0323E3979ABF70A5D71C305052E35B494F8AA
   |texte=   Compilation of dictionaries for semantic attribute analysis of television news captions
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024